
    Radar and video as the perfect match: a cooperative method for sensor fusion

    Accurate detection and tracking of road users is essential for driverless cars and many other smart mobility applications. As no single sensor can provide the required accuracy and robustness, the output from several sensors needs to be combined. Radar and video in particular are a good match, because their strengths and weaknesses complement each other. Researchers from IPI – an imec research group at Ghent University – developed a new technique to optimize radar-video fusion by exchanging information at an earlier stage.

    Human gesture classification by brute-force machine learning for exergaming in physiotherapy

    In this paper, a novel approach to human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier such as Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. The gestures can have partially similar postures, such that the classifier decides on the dissimilar postures. This brute-force classification strategy works because dynamic human gestures contain sufficiently dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Ground truth can also be obtained easily, since all postures in a gesture get the same label, without any discretization into consecutive postures. This way, new gestures can easily be added, which is advantageous in adaptive game development. We evaluate our strategy by a leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an accuracy of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy of 98.37%, which is on average 3.57% better than existing methods.
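
    The voting strategy is simple enough to sketch. Below is a minimal illustration, with scikit-learn's Random Forest standing in for the paper's classifier; the 15-frame window, 60-dimensional posture features and dummy training data are invented for the example.

```python
# Per-frame classification of postures, with the temporal dimension handled
# afterwards by majority voting over a sliding window of consecutive
# predictions. Window size and feature layout are assumptions.
from collections import Counter, deque

import numpy as np
from sklearn.ensemble import RandomForestClassifier

WINDOW = 15  # number of consecutive frame predictions to vote over (assumed)

# Train on individual postures: every posture in a gesture carries the
# gesture's label, with no discretization into consecutive sub-postures.
X_train = np.random.rand(1000, 60)       # e.g. 20 joints x 3 coordinates
y_train = np.random.randint(0, 5, 1000)  # 5 gesture classes (dummy data)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

window = deque(maxlen=WINDOW)

def classify_frame(posture: np.ndarray):
    """Predict the current posture, then return the majority vote so far."""
    window.append(clf.predict(posture.reshape(1, -1))[0])
    return Counter(window).most_common(1)[0][0]

# Online use: feed incoming skeleton frames one by one; a stable label
# emerges early in the gesture, before it has completed.
for posture in np.random.rand(100, 60):
    label = classify_frame(posture)
```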

    Robust pan/tilt compensation for foreground-background segmentation

    In this paper, we describe a robust method for compensating the panning and tilting motion of a camera, applied to foreground-background segmentation. First, the necessary internal camera parameters are determined through feature-point extraction and tracking. From these parameters, two motion models for points in the image plane are established. The first model assumes a fixed tilt angle, whereas the second allows simultaneous pan and tilt. At runtime, these models are used to compensate for the motion of the camera in the background model. We show that these models provide a robust compensation mechanism and improve the foreground masks of an otherwise state-of-the-art unsupervised foreground-background segmentation method. The resulting algorithm obtains F1 scores above 80% on every daytime video in our test set when as few as eight feature matches are used to determine the background compensation, whereas standard approaches need significantly more feature matches to produce similar results.
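
    The compensation idea can be sketched compactly. The paper derives dedicated pan and pan/tilt motion models from the estimated internal camera parameters; the sketch below substitutes a generic RANSAC homography fitted to ORB feature matches, which conveys the same principle of warping the background model to the current viewpoint.

```python
# Hedged stand-in for the paper's motion models: align the background model
# with the current frame using feature matches, so background subtraction
# operates on a motion-compensated model.
import cv2
import numpy as np

orb = cv2.ORB_create(500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def compensate(background: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Warp the background model onto the current camera viewpoint."""
    kb, db = orb.detectAndCompute(background, None)
    kf, df = orb.detectAndCompute(frame, None)
    if db is None or df is None:
        return background
    matches = matcher.match(db, df)
    if len(matches) < 8:  # the paper reports robustness from only eight matches
        return background
    src = np.float32([kb[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kf[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return background
    h, w = frame.shape[:2]
    return cv2.warpPerspective(background, H, (w, h))

# The foreground mask is then computed between the frame and the warped model.
```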

    Camera-based system for drafting detection while cycling

    Drafting involves cycling so close behind another person that wind resistance is significantly reduced, which is illegal during most long-distance and several short-distance triathlon and duathlon events. In this paper, a proof of concept for a drafting detection system based on computer vision is proposed. After detecting and tracking a bicycle through the various scenes, the distance to this object is estimated through computational geometry. The probability of drafting is then determined through statistical analysis of subsequent measurements over an extended period of time. These algorithms are tested using a static recording and a recording that simulates a race situation, with ground truth distances obtained from a Light Detection And Ranging (LiDAR) system. The most accurate distance estimation method yields an average error of 0.46 m in our test scenario. When sampling the distances at periods of 1 or 2 s, simulations demonstrate that a drafting violation is detected quickly for cyclists riding 2 m or more below the limit, while false positives are generally avoided during the race-like test set-up and five-hour race simulations.
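
    A rough sketch of the two stages follows, with invented calibration constants: a pinhole model converts the bicycle's apparent size into a distance, and a simple run-length rule stands in for the paper's statistical analysis of subsequent measurements.

```python
# Illustrative sketch, not the paper's exact geometry; every constant below
# is an assumption made for the example.
FOCAL_PX = 1400.0        # focal length in pixels (assumed calibration)
BIKE_LENGTH_M = 1.7      # true bicycle length in metres (assumed)
DRAFT_LIMIT_M = 10.0     # legal minimum gap (varies per race rules)
SAMPLE_PERIOD_S = 2.0    # distances sampled every 2 s, as in the paper
VIOLATION_AFTER_S = 20.0 # assumed dwell time below the limit before flagging

def estimate_distance(bbox_width_px: float) -> float:
    """Pinhole model: distance = focal_length * true_size / apparent_size."""
    return FOCAL_PX * BIKE_LENGTH_M / bbox_width_px

def detect_drafting(distances_m: list) -> bool:
    """Flag a violation after enough consecutive samples below the limit."""
    needed = int(VIOLATION_AFTER_S / SAMPLE_PERIOD_S)
    run = 0
    for d in distances_m:
        run = run + 1 if d < DRAFT_LIMIT_M else 0
        if run >= needed:
            return True
    return False
```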

    Single camera based visual tracking and multilayered scene modelling

    The most interesting information in video images is often related to moving objects. In tracking applications, one follows moving objects or people throughout a sequence of images. The system determines the position of the object of interest in each frame, which results in a track. Applications are often found in traffic, surveillance and sports analysis. Occlusion is an important issue in all tracking systems: interesting objects are partially or fully hidden behind other static or moving objects. Systems that can identify these occluded objects will generate less faulty information. Consequently, tracking and occlusion are strongly related to 3D scene models. My research is focused on the development of a robust, self-learning, single-camera tracking system, which is based on an automatically built, multilayered scene model. Single-camera systems cannot directly provide accurate depth information, but they can still gather a lot of useful depth knowledge about a scene, based on perspective effects, known planes (for example, the floor) and the analysis of occlusions. This requires a scene model that tells the user where the occluding objects are, where people can walk and where cars can drive.

    EFIC: edge based foreground background segmentation and interior classification for dynamic camera viewpoints

    Foreground-background segmentation algorithms attempt to separate interesting or changing regions from the background in video sequences. Foreground detection is obviously more difficult when the camera viewpoint changes dynamically, such as when the camera undergoes a panning or tilting motion. In this paper, we propose an edge-based foreground-background estimation method, which can automatically detect and compensate for camera viewpoint changes. We show that this method significantly outperforms state-of-the-art algorithms for the panning sequences in the ChangeDetection.NET 2014 dataset, while still performing well in the other categories.
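
    As a loose sketch of the edge-based idea (with OpenCV's Canny detector and an assumed learning rate standing in for EFIC's actual estimator): maintain a slowly adapting background edge model and flag edges that the model cannot explain.

```python
import cv2
import numpy as np

ALPHA = 0.05     # background edge-model learning rate (assumed)
bg_edges = None  # running model of edges belonging to the background

def foreground_edges(frame_gray: np.ndarray) -> np.ndarray:
    """Return a binary mask of edges not explained by the background model."""
    global bg_edges
    edges = cv2.Canny(frame_gray, 50, 150).astype(np.float32) / 255.0
    if bg_edges is None:
        bg_edges = edges.copy()
    fg = np.clip(edges - bg_edges, 0.0, 1.0)            # unexplained edges
    bg_edges = (1 - ALPHA) * bg_edges + ALPHA * edges   # slow adaptation
    return (fg > 0.5).astype(np.uint8) * 255

# EFIC additionally classifies the interior regions enclosed by foreground
# edges and compensates viewpoint changes before updating the model.
```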

    Automatic annotation of pedestrians in thermal images using background/foreground segmentation for training deep neural networks

    Deep neural networks for object detection have become of significant interest with the substantial improvement in their efficiency and the increase in their applications. However, training such a network requires a large annotated dataset, which is very expensive in terms of time and human effort. In this context, several automatic image annotation solutions have been proposed, but they depend on rich visual features and are thus suitable for color images only. On the other hand, recent experiments have shown that color cameras alone are not enough for pedestrian-related applications, and there is a need for other visual sensors as well, such as thermal cameras. In this paper, we propose an automatic image annotation technique for pedestrian detection in thermal images that uses an adaptive background/foreground estimation model to train Faster-RCNN. The results presented in this paper, obtained through long-term experiments, demonstrate the efficacy of our technique. They also show that the proposed technique is very useful for generating image annotations automatically and for training a deep neural network without a manually annotated dataset, for cameras of different modalities.
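
    A minimal sketch of the annotation step, with OpenCV's MOG2 subtractor standing in for the paper's adaptive background/foreground model and an assumed minimum blob size; the resulting boxes would then be written out as training annotations for Faster-RCNN.

```python
import cv2

MIN_AREA = 200  # discard small noise blobs (assumed threshold)

subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                detectShadows=False)

def annotate(frame_gray):
    """Return [(x, y, w, h), ...] pedestrian boxes for one thermal frame."""
    mask = subtractor.apply(frame_gray)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= MIN_AREA]
```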

    Human action recognition using hierarchic body related occupancy maps

    This paper introduces a novel spatial method for human action recognition that is discriminative without needing temporal information or action key poses. First, skeletal data is acquired with the Microsoft Kinect v2 sensor and undergoes a Pose Invariant Normalization (PIN) process. The PIN process translates, rotates and scales the various observed poses to eliminate body differences and positional differences between subjects. Second, the method uses a Body Related Occupancy Map (BROM), which describes in a 3D grid how the area around specific body parts is used, as a strong indicator of the particular action being performed. The BROM and its 2D projections are used as feature inputs for Random Forest classifiers. These classifiers are then combined in a hierarchic structure to boost the classification performance. The approach is tested on a self-captured database of 23 human actions for game-play. On this database, a classification accuracy of 91% is achieved for the hierarchic BROM (HiBROM) classification. On the public CAD60 dataset, the HiBROM classifier attains 87.2% accuracy, which is comparable to other state-of-the-art methods.
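
    The occupancy-map feature itself is compact enough to sketch; the 8x8x8 grid and 0.6 m half-extent below are invented, and pose-invariant normalization is assumed to have been applied already.

```python
import numpy as np

GRID = 8      # occupancy grid resolution per axis (assumed)
EXTENT = 0.6  # half-width in metres of the cube around a body part (assumed)

def brom(joints_xyz: np.ndarray, centre_idx: int) -> np.ndarray:
    """Count PIN-normalized joints in a 3D grid around one body part."""
    rel = joints_xyz - joints_xyz[centre_idx]               # centre the grid
    cells = ((rel + EXTENT) / (2 * EXTENT) * GRID).astype(int)
    grid = np.zeros((GRID, GRID, GRID))
    for c in cells:
        if np.all((c >= 0) & (c < GRID)):                   # inside the cube
            grid[tuple(c)] += 1
    return grid

# The grid and its 2D projections (grid.sum(axis=k)) feed per-part Random
# Forest classifiers, which a hierarchy then combines.
```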

    Efficient detection of crossing pedestrians from a moving vehicle with an array of cameras

    In this paper, we describe a novel method for detecting crossing pedestrians, and in general any object that is moving perpendicular to the driving direction of the vehicle. This is achieved by combining video snapshots from multiple cameras placed in a linear configuration, taken at multiple time instances. We demonstrate that the proposed array configuration imposes tight constraints on the expected disparity of static objects in a certain image region for a given camera pair. These regions are distinct for different camera pairs. In that manner, static regions can generally be distinguished from moving targets throughout the entire field of view when enough pairs are analyzed, requiring only straightforward image processing techniques. On a self-captured dataset with crossing pedestrians, our proposed method reaches an F1 detection score of 83.66% and a mAP of 84.79% on an overlap test when used stand-alone, running at 59 frames per second without GPU acceleration. When combined with the Yolo V4 object detector in cooperative fusion, the proposed method boosts the detector's maximal F1 score on the same dataset from 87.86% to 92.68% and its mAP from 90.85% to 94.30%. Combining it in the same way with the lower-power Yolo-Tiny V4 detector yields F1 and mAP increases from 68.57% to 81.16% and from 72.32% to 85.25%, respectively.
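
    The disparity constraint at the heart of the method is easy to illustrate: a static point at depth Z seen by a horizontal camera pair with baseline B and focal length f produces disparity d = f·B/Z, so static scene points can only yield disparities within the band implied by the depth range that the pair covers. The calibration numbers below are assumptions, not the paper's.

```python
FOCAL_PX = 1200.0            # focal length in pixels (assumed)
BASELINE_M = 0.3             # spacing of this camera pair in the array (assumed)
Z_MIN_M, Z_MAX_M = 2.0, 60.0 # depth band the pair is responsible for (assumed)

D_MAX = FOCAL_PX * BASELINE_M / Z_MIN_M  # disparity of the nearest static point
D_MIN = FOCAL_PX * BASELINE_M / Z_MAX_M  # disparity of the farthest static point

def is_moving(x_left_px: float, x_right_px: float,
              tol_px: float = 2.0) -> bool:
    """Flag a match whose disparity no static object in range could produce."""
    disparity = x_left_px - x_right_px
    return not (D_MIN - tol_px <= disparity <= D_MAX + tol_px)
```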